A Mandarin Voice Organizer Based on a Template-Matching Speech Recognizer
Authors
Abstract
A survey of currently available voice organizers shows that all of them accept only voice commands or word-based commands. Operating an organizer with natural spoken language remains a difficult problem. In this paper, we propose a template-based speech recognizer that accepts near-natural (constrained) spoken language. Because the template-based recognizer is a domain-dependent speech recognition system, representing and matching sentence templates are its main tasks. We use finite state networks (FSNs) to represent the sentence templates and propose a vowel-based, syllable-scoring method to match the correct template. By replacing the template set, the method can easily be applied to other domains. In addition, two main functions, voice recording and voice message query, are implemented on our organizer using a fast CELP encoder/decoder to compress and decompress the voice data in real time. Experimental results show that the 31 collected sentence templates greatly improve the voice interface between the user and the voice organizer.
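The abstract does not give the details of the template representation or the scoring rule, so the following is only a minimal sketch of the general idea, under stated assumptions. Each sentence template is modeled as a linear finite state network (a sequence of slots, each with alternative words written as toneless pinyin syllables), and a recognized syllable earns full credit for an exact match and partial credit when only its vowel part (the final) matches. The template contents, the 1.0/0.5 weights, and all function names are hypothetical and are not taken from the paper.

```python
from itertools import product

# Pinyin initials, two-letter ones first so "zh" is tried before "z".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def final_of(syllable):
    """Return the vowel part (final) of a toneless pinyin syllable."""
    for ini in INITIALS:
        if syllable.startswith(ini) and len(syllable) > len(ini):
            return syllable[len(ini):]
    return syllable


def syllable_score(heard, expected):
    """Assumed weights: 1.0 for an exact match, 0.5 if only the final matches."""
    if heard == expected:
        return 1.0
    if final_of(heard) == final_of(expected):
        return 0.5
    return 0.0


def paths(template):
    """Enumerate every syllable sequence accepted by a linear FSN."""
    for choice in product(*template):
        yield [syl for word in choice for syl in word.split()]


def template_score(heard, template):
    """Best position-wise score over all paths, normalized to [0, 1]."""
    best = 0.0
    for path in paths(template):
        total = sum(syllable_score(h, e) for h, e in zip(heard, path))
        best = max(best, total / max(len(heard), len(path)))
    return best


def match(heard, templates):
    """Return the name of the highest-scoring template."""
    return max(templates, key=lambda name: template_score(heard, templates[name]))


# Hypothetical two-template domain: each slot lists alternative words.
TEMPLATES = {
    "record": [["qing", "wo yao"], ["lu yin"]],
    "query": [["cha xun", "bo fang"], ["liu yan"]],
}
```

With these illustrative templates, the utterance `["wo", "yao", "nu", "yin"]`, in which "lu" was misrecognized as "nu", still selects the `record` template because the shared final "u" earns partial credit. Swapping `TEMPLATES` for another set is the sketch's analogue of moving the recognizer to a different domain.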
Similar resources
Very-large-vocabulary Mandarin voice message file retrieval using speech queries
In order to solve the problem with the new environment of fast growth of audio resources on the Internet, this paper presents a new approach which is capable of retrieving Mandarin voice message files using queries of unconstrained speech. By properly utilizing the monosyllabic structure of the Chinese language, the proposed approach performs the statistical similarity estimation between the sp...
Discrete Wavelet Transform Based Multiple Template-Matching for Speech Recognition
This invention is a method of speech recognition that consists of plural overlapped template-matching (TM). In order to economize calculations of TM, the data of low resolution given by Haar discrete wavelet transform (HDWT) is used. Templates of wavelet coefficient (WC) on waveform are used for recognition of phoneme. A sum of WC in a scale (SWC) corresponds to a frequency component on...
Intelligent Multi-modal Interfaces for Mobile Applications in Hostile Environment (IM-HOST)
Multi-modal interfaces for mobile applications include tiny screens, keyboards, touch screens, ear phones, microphones and software components for voice-based man-machine interaction. The software enabling voice recognition, as well as the microphone, are of primary importance in a noisy environment. Current performances of voice applications are reasonably good in quiet environment....
Using English Phoneme Models for Chinese Speech Recognition
To build a speech recognizer, database design, collection and transcription is the most time consuming and tedious job. This paper proposes some fast and easy methods to use English phoneme models for Mandarin and Cantonese speech recognition with little to no training data in Mandarin and Cantonese. While a recognizer built with such transformed models might not perform as ideally as one that ...
An Image-Based Trainable Symbol Recognizer for Sketch-Based Interfaces
We describe a trainable, hand-drawn symbol recognizer based on a multi-layer recognition scheme. Symbols are internally represented as binary templates. An ensemble of four template classifiers ranks each definition according to similarity with an unknown symbol. Scores from the individual classifiers are then aggregated to determine the best definition for the unknown. Ordinarily, template-mat...
Publication date: 1996